11 research outputs found

    Online Bipartite Matching with Decomposable Weights

    We study a weighted online bipartite matching problem: G(V1, V2, E) is a weighted bipartite graph where V1 is known beforehand and the vertices of V2 arrive online. The goal is to match vertices of V2 as they arrive to vertices in V1, so as to maximize the sum of weights of edges in the matching. If assignments to V1 cannot be changed, no bounded competitive ratio is achievable. We study the weighted online matching problem with free disposal, where vertices in V1 can be assigned multiple times, but only get credit for the maximum-weight edge assigned to them over the course of the algorithm. For this problem, the greedy algorithm is 0.5-competitive, and determining whether a better competitive ratio is achievable is a well-known open problem. We identify an interesting special case where the edge weights are decomposable as the product of two factors, one corresponding to each endpoint of the edge. This is analogous to the well-studied related-machines model in the scheduling literature, although the objective functions are different. For this case of decomposable edge weights, we design a 0.5664-competitive randomized algorithm for complete bipartite graphs. We show that such instances with decomposable weights are non-trivial by establishing upper bounds of 0.618 for deterministic and 0.8 for randomized algorithms. A tight competitive ratio of 1 − 1/e ≈ 0.632 was known previously both for the 0-1 case and for the case where edge weights depend on the offline vertices only, but in these cases reassignments cannot change the quality of the solution. Beating 0.5 for weighted matching where reassignments are necessary has been a significant challenge. We thus give the first online algorithm with competitive ratio strictly better than 0.5 for a non-trivial case of weighted matching with free disposal.
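    The 0.5-competitive greedy baseline mentioned above can be sketched as follows: each arriving online vertex is assigned to the offline vertex with the largest marginal gain, where under free disposal the gain of an edge of weight w to offline vertex u is w minus u's current credit (its best edge so far). The function name and input encoding here are illustrative, not from the paper.

```python
def greedy_free_disposal(offline, arrivals):
    """Greedy for online weighted matching with free disposal.

    offline:  list of offline (V1) vertices.
    arrivals: list of dicts, one per online vertex, mapping an
              offline vertex to the weight of the edge to it.
    Returns (total value, credit per offline vertex).
    """
    # credit[u] = weight of the best edge currently assigned to u
    credit = {u: 0.0 for u in offline}
    total = 0.0
    for edges in arrivals:
        # marginal gain of assigning this arrival to u is w - credit[u],
        # because u only keeps its maximum-weight edge (free disposal)
        best_u, best_gain = None, 0.0
        for u, w in edges.items():
            gain = w - credit[u]
            if gain > best_gain:
                best_u, best_gain = u, gain
        if best_u is not None:
            credit[best_u] += best_gain
            total += best_gain
    return total, credit
```

    On an instance where a later arrival re-uses an offline vertex, only the improvement over the displaced edge is counted, which is exactly where greedy loses its factor of 2.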

    Sparsity lower bounds for dimensionality reducing maps

    We give near-tight lower bounds for the sparsity required in several dimensionality-reducing linear maps. First, consider the Johnson-Lindenstrauss (JL) lemma, which states that for any set of n vectors in R^d there is a matrix A ∈ R^{m×d} with m = O(ε^{-2} log n) such that mapping by A preserves pairwise Euclidean distances of these n vectors up to a 1 ± ε factor. We show that there exists a set of n vectors such that any such matrix A with at most s non-zero entries per column must have s = Ω(ε^{-1} log n / log(1/ε)) as long as m < O(n / log(1/ε)). This improves the lower bound of Ω(min{ε^{-2}, ε^{-1}√(log_m d)}) by [Dasgupta-Kumar-Sarlós, STOC 2010], which held only against the stronger property of distributional JL, and only against a certain restricted class of distributions; our lower bound is against the JL lemma itself, with no restrictions. It matches the sparse Johnson-Lindenstrauss upper bound of [Kane-Nelson, SODA 2012] up to an O(log(1/ε)) factor. Next, we show that any m×n matrix with the k-restricted isometry property (RIP) with constant distortion must have at least Ω(k log(n/k)) non-zeroes per column if m = O(k log(n/k)), the optimal number of rows of RIP matrices, and k < n / polylog n. This improves the previous lower bound of Ω(min{k, n/m}) by [Chandar, 2010] and shows that for virtually all k it is impossible to have a sparse RIP matrix with an optimal number of rows. Both lower bounds above also offer a tradeoff between sparsity and the number of rows. Lastly, we show that any oblivious distribution over subspace-embedding matrices with 1 non-zero per column, preserving distances in a d-dimensional subspace up to a constant factor, must have at least Ω(d^2) rows. This matches one of the upper bounds in [Nelson-Nguyễn, 2012] and shows the impossibility of combining the best of both constructions in that work, namely 1 non-zero per column and Õ(d) rows.
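    To make the "s non-zeros per column" regime concrete, here is a minimal sketch of a sparse JL-style sign matrix: each column gets exactly s non-zero entries of value ±1/√s in uniformly chosen rows. This is a simplified construction in the spirit of the Kane-Nelson upper bound cited above, not the paper's object of study, and the function name is illustrative.

```python
import numpy as np

def sparse_jl_matrix(m, d, s, rng):
    """Random m x d matrix with exactly s non-zeros per column.

    Each column has s entries of value +-1/sqrt(s) placed in s distinct
    rows, so every column has unit Euclidean norm.
    """
    A = np.zeros((m, d))
    for j in range(d):
        rows = rng.choice(m, size=s, replace=False)      # s distinct rows
        A[rows, j] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
    return A
```

    The lower bound in the abstract says that for some point sets, no matrix of this column sparsity can do much better: s must grow like ε^{-1} log n / log(1/ε).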

    Cutting corners cheaply, or how to remove Steiner points

    Our main result is that the Steiner Point Removal (SPR) problem can always be solved with polylogarithmic distortion, which resolves in the affirmative a question posed by Chan, Xia, Konjevod, and Richa (2006). Specifically, we prove that for every edge-weighted graph G = (V, E, w) and a subset of terminals T ⊆ V, there is a graph G′ = (T, E′, w′) that is isomorphic to a minor of G, such that for every two terminals u, v ∈ T, the shortest-path distances between them in G and in G′ satisfy d_{G,w}(u, v) ≤ d_{G′,w′}(u, v) ≤ O(log^6 |T|) · d_{G,w}(u, v). Our existence proof actually gives a randomized polynomial-time algorithm. Our proof features a new variant of metric decomposition. It is well known that every finite metric space (X, d) admits a β-separating decomposition for β = O(log |X|), which roughly means that for every desired diameter bound Δ > 0 there is a randomized partitioning of X satisfying the following separation requirement: for every x, y ∈ X, the probability that they lie in different clusters of the partition is at most β · d(x, y)/Δ. We introduce an additional requirement, the following tail bound: for every shortest path P of length d(P) ≤ Δ/β, the number of clusters of the partition that meet the path P, denoted Z_P, satisfies Pr[Z_P > t] ≤ 2e^{−Ω(t)} for all t > 0.
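    For intuition about what a separating decomposition looks like, here is a sketch of the classic random-shift scheme (a textbook CKR-style construction, assumed here for illustration; it is not the paper's new tail-bounded variant): draw a random radius in [Δ/4, Δ/2] and a random order of the points as centers, and assign each point to the first center in the order within that radius.

```python
import random

def separating_decomposition(points, dist, delta, seed=0):
    """Random partition with cluster diameter at most delta.

    Classic random-shift scheme: random radius r in [delta/4, delta/2],
    random order of centers; each point joins the first center in the
    order that is within distance r of it.
    """
    rng = random.Random(seed)
    order = list(points)
    rng.shuffle(order)                       # random priority of centers
    r = rng.uniform(delta / 4.0, delta / 2.0)
    cluster = {}
    for p in points:
        for c in order:
            if dist(p, c) <= r:              # p always matches itself
                cluster[p] = c
                break
    return cluster
```

    Any two points in the same cluster are within r of the same center, so cluster diameters are at most 2r ≤ Δ; the separation probability β · d(x, y)/Δ comes from analyzing the random radius and order.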

    Turnstile Streaming Algorithms Might as Well Be Linear Sketches

    In the turnstile model of data streams, an underlying vector x ∈ {−m, −m+1, ..., m−1, m}^n is presented as a long sequence of positive and negative integer updates to its coordinates. A randomized algorithm seeks to approximate a function f(x) with constant probability while only making a single pass over this sequence of updates and using a small amount of space. All known algorithms in this model are linear sketches: they sample a matrix A from a distribution on integer matrices in the preprocessing phase, and maintain the linear sketch A·x while processing the stream. At the end of the stream, they output an arbitrary function of A·x. One cannot help but ask: are linear sketches universal? In this work we answer this question by showing that any 1-pass constant-probability streaming algorithm for approximating an arbitrary function f of x in the turnstile model can also be implemented by sampling a matrix A from the uniform distribution on O(n log m) integer matrices, with entries of magnitude poly(n), and maintaining the linear sketch Ax. Furthermore, the logarithm of the number of possible states of Ax, as x ranges over {−m, −m+1, ..., m}^n, plus the amount of randomness needed to store A, is at most a logarithmic factor larger than the space required of the space-optimal algorithm. Our result shows that to prove space lower bounds for 1-pass streaming algorithms, it suffices to prove lower bounds in the simultaneous model of communication complexity, rather than the stronger 1-way model. Moreover, the fact that we can assume we have a linear sketch with polynomially bounded entries further simplifies existing lower bounds; e.g., for frequency moments we present a simpler proof of the Ω̃(n^{1−2/k}) bit-complexity lower bound without using communication complexity.
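    The defining property of a linear sketch is that each turnstile update (i, Δ) changes A·x by Δ times column i of A, so the sketch can be maintained without ever storing x. A minimal sketch of this maintenance loop (illustrative names; x is kept here only to verify linearity):

```python
import numpy as np

def stream_sketch(A, updates, n):
    """Maintain the linear sketch A @ x over a turnstile stream.

    A:       m x n sketch matrix.
    updates: iterable of (coordinate i, integer delta) pairs.
    Returns (final sketch, final vector x) for verification.
    """
    sketch = np.zeros(A.shape[0])
    x = np.zeros(n)
    for i, delta in updates:
        sketch += delta * A[:, i]   # linearity: no need to store x
        x[i] += delta               # kept only to check sketch == A @ x
    return sketch, x
```

    The paper's result says this simple pattern is essentially without loss of generality: any 1-pass turnstile algorithm can be converted into such a loop with near-optimal space.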

    On Sketching Matrix Norms and the Top Singular Vector

    Sketching is a prominent algorithmic tool for processing large data. In this paper, we study the problem of sketching matrix norms. We consider two sketching models. The first is bilinear sketching, in which there is a distribution over pairs of r×n matrices S and n×s matrices T such that for any fixed n×n matrix A, from S·A·T one can approximate ‖A‖_p up to an approximation factor α ≥ 1 with constant probability, where ‖A‖_p is a matrix norm. The second is general linear sketching, in which there is a distribution over linear maps L: R^{n^2} → R^k, such that for any fixed n×n matrix A, interpreting it as a vector in R^{n^2}, from L(A) one can approximate ‖A‖_p up to a factor α. We study some of the most frequently occurring matrix norms, which correspond to Schatten p-norms for p ∈ {0, 1, 2, ∞}. The p-th Schatten norm of a rank-r matrix A is defined to be ‖A‖_p = (∑_{i=1}^r σ_i^p)^{1/p}, where σ_1, ..., σ_r are the singular values of A. When p = 0, ‖A‖_0 is defined to be the rank of A. The cases p = 1, 2, and ∞ correspond to the trace, Frobenius, and operator norms, respectively. For bilinear sketches we show: 1. For p = ∞, any sketch must have r·s = Ω(n^2/α^4) dimensions. This matches an upper bound of Andoni and Nguyen (SODA, 2013), and implies one cannot approximate the top right singular vector v of A by a vector v′ with ‖v′ − v‖_2 ≤ 1/2 using r·s = õ(n^2). 2. For p ∈ {0, 1} and constant α, any sketch must have r·s ≥ n^{1−ε} dimensions, for arbitrarily small constant ε > 0. 3. For even integers p ≥ 2, we give a sketch with r·s = O(n^{2−4/p} ε^{−2}) dimensions for obtaining a (1 + ε)-approximation. This is optimal up to logarithmic factors, and is the first general subquadratic upper bound for sketching the Schatten norms. For general linear sketches our results, though not optimal, are qualitatively similar: for p = ∞, k = Ω(n^{3/2}/α^4), and for p ∈ {0, 1}, k = Ω(√n). These give separations between the sketching complexity of Schatten p-norms and that of the corresponding vector p-norms, and rule out a table-lookup nearest-neighbor search for p = 1, making progress on a question of Andoni.
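    The Schatten p-norm definition above is easy to evaluate exactly from the singular values, which is useful for checking a sketch against ground truth on small matrices. A minimal implementation of the definition (not a sketching algorithm):

```python
import numpy as np

def schatten_norm(A, p):
    """Schatten p-norm of A, computed from its singular values.

    p = 0 is the rank, p = inf the operator norm, p = 1 the trace
    norm, p = 2 the Frobenius norm.
    """
    s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
    if p == 0:
        return int(np.sum(s > 1e-12))        # rank = number of nonzero sigmas
    if p == np.inf:
        return float(s[0])                   # largest singular value
    return float(np.sum(s ** p) ** (1.0 / p))
```

    The abstract's point is that while this exact computation needs all n^2 entries, a bilinear sketch of dimension r·s = O(n^{2−4/p} ε^{−2}) suffices for even integers p ≥ 2, whereas p ∈ {0, 1, ∞} are provably much harder to sketch.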

    Preserving Terminal Distances Using Minors (© 2014 Society for Industrial and Applied Mathematics)

    Abstract. We introduce the following notion of compressing an undirected graph G with (nonnegative) edge-lengths and terminal vertices R ⊆ V(G). A distance-preserving minor is a minor G′ (of G) with possibly different edge-lengths, such that R ⊆ V(G′) and the shortest-path distance between every pair of terminals is exactly the same in G and in G′. We ask: what is the smallest f*(k) such that every graph G with k = |R| terminals admits a distance-preserving minor G′ with at most f*(k) vertices? Simple analysis shows that f*(k) ≤ O(k^4). Our main result proves that f*(k) ≥ Ω(k^2), significantly improving on the trivial f*(k) ≥ k. Our lower bound holds even for planar graphs G, in contrast to graphs G of constant treewidth, for which we prove that O(k) vertices suffice. Key words: distance-preserving minor, graph compression, vertex sparsification, metric embedding.
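    The definition can be checked mechanically: compute all-pairs shortest paths in G and in a candidate minor G′ and compare the terminal-terminal entries. A small worked example (illustrative, not from the paper): for the path 0-1-2-3 with unit lengths and terminals {0, 3}, contracting the interior edges yields a single edge of length 3, a distance-preserving minor on 2 vertices.

```python
def shortest_paths(n, edges):
    """All-pairs shortest paths (Floyd-Warshall) on an undirected
    weighted graph with vertices 0..n-1 and edges (u, v, length)."""
    INF = float('inf')
    d = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```

    The question f*(k) asks how few vertices such a G′ can be guaranteed to have in general, as a function of the number of terminals k alone.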

    Beyond Locality-Sensitive Hashing

    We present a new data structure for the c-approximate near neighbor problem (ANN) in Euclidean space. For n points in R^d, our algorithm achieves O_c(d·n^ρ) query time and O_c(n^{1+ρ} + nd) space, where ρ ≤ 7/(8c^2) + O(1/c^3) + o_c(1). This is the first improvement over the result of Andoni and Indyk (FOCS 2006) and the first data structure that bypasses the locality-sensitive hashing lower bound proved by O'Donnell, Wu, and Zhou (ITCS 2011). By a standard reduction we obtain a data structure for the Hamming space and ℓ1 norm with ρ ≤ 7/(8c) + O(1/c^{3/2}) + o_c(1), which is the first improvement over the result of Indyk and Motwani (STOC 1998).
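    For contrast, the classic LSH baseline the abstract improves on can be sketched in a few lines: bit-sampling LSH for Hamming space (Indyk-Motwani style), where each of L hash tables projects points onto k random coordinates and a query retrieves everything in its bucket. This is the illustrative baseline only, not the paper's data-dependent construction.

```python
import random

def bit_sampling_lsh(points, query, k, L, seed=0):
    """Return candidate indices colliding with query under L hashes.

    Each hash samples k random coordinates; a point is a candidate if
    its projection matches the query's in at least one hash table.
    """
    rng = random.Random(seed)
    d = len(query)
    candidates = set()
    for _ in range(L):
        idx = [rng.randrange(d) for _ in range(k)]   # one hash: k sampled bits
        qkey = tuple(query[i] for i in idx)
        for pid, p in enumerate(points):
            if tuple(p[i] for i in idx) == qkey:     # same bucket as query
                candidates.add(pid)
    return candidates
```

    Tuning k and L trades collision probability for bucket size; the exponent ρ governs how n^ρ buckets/candidates scale, and the paper's point is that going beyond this hashing framework gives a strictly smaller ρ than any LSH can achieve.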